1 Help

Each function have a detailed help accessible in R via ?{funtion}.

2 Tests datasets

The dataset can be downloaded via this link.

This tutorial assume that you have extracted all the read file in a folder named reads along with the sample-metadata.csv file.

We share a 24 samples test dataset extract from rats feces at two different time (t0 & t50) and in two nutrition conditions. Also included two extraction control sample (blank).

sm <- read.table("sample_metadata.csv", sep="\t",header=TRUE)
DT::datatable(sm)
load("decontam_out/robjects.Rdata")

3 Processing of raw sequences

3.1 ASV definition with DADA2

The first step will be the creation of ASVs (Amplicon Sequence Variants) thanks to the dada2 package. In rANOMALY, only one function is needed to compute all the different steps require from this package.

Sample names will be extracted from the file name, so files must be formatted as followed : {sample-id1}_R1.fastq.gz {sample-id1}_R2.fastq.gz etc…

dada_res = dada2_fun(path="./reads", dadapool = "pseudo", compress=TRUE, plot=FALSE)

Main output: - read_tracking.csv that summarize the read number after each filtering step.

DT::datatable(read.table("dada2_out/read_tracking.csv",sep="\t",header=TRUE))

The sample names extracted from the file name. We consider as sample name anything that is before the first underscore. This must match the sample names that are in sample metadata files. input: raw read number. filtered: after dada2 filtering step: no N’s in sequence, low quality, and phiX. denoisedF & denoisedR: after denoising. Forward & Reverse. merged: after merging R1 & R2. nonchim: after chimeras filtering.

  • dada2_robjects.Rdata with raw ASV table and representative sequences in objects otu.table, seqtab.export & seqtab.nochim.
  • raw_asv-table.csv
  • rep-seqs.fna

3.2 Taxonomic assignment

This function uses IDTAXA function from DECIPHER package, and allows to use 2 differents databases. It keeps the best assignation on 2 criteria, resolution (depth) and confidence. The final taxonomy is validated by multiple ancestors taxa and incongruity correction step.

We share the latest databases we use in the IDTAXA format in this link. You can also generate your own database following those instructions and scripts we provide in another repository.

tax.table = assign_taxo_fun(dada_res = dada_res, id_db = c("path_to_your_banks/silva/SILVA_SSU_r132_March2018.RData","path_to_your_banks/DAIRYdb_v1.2.0_20190222_IDTAXA.RData") )

Main output: - taxo_robjects.Rdata with taxonomy in phyloseq format in tax.table object. - final_tax_table.csv the final assignation table that will be use in next steps. - allDB_tax_table.csv raw assignations from the two databases, mainly for debugging.

3.3 Phylogenetic Tree

The phylogenetic tree from the representative sequences is generated using phangorn and DECIPHER packages.

tree = generate_tree_fun(dada_res)

Main output: - tree_robjects.Rdata with phylogenetic tree object in phyloseq format.

3.4 Phyloseq object

To create a phyloseq object, we need to merge four objects and one file: - the asv table otu.table and the representative sequences seqtab.nochim from dada2_robjects.Rdata - a taxonomy table taxo_robjects.Rdata from taxo_robjects.Rdata - the phylogenetic tree tree from tree_robjects.Rdata - metadata from sample-metadata.csv

data = generate_phyloseq_fun(dada_res = dada_res, taxtable = tax.table, tree = tree, metadata = "./sample_metadata.csv")

Main output: - robjects.Rdata with phyloseq object in data for raw counts and data_rel for relative abundance.

3.5 Decontamination

The decontam_fun function uses decontam R package with control samples to filter contaminants. The decontam package offers two main methods, frequency and prevalence (and then you can combine those methods). For frequency method, it is mandatory to have the dna concentration of each sample in phyloseq (and hence in the sample-metadata.csv). “In this method, the distribution of the frequency of each sequence feature as a function of the input DNA concentration is used to identify contaminants.” In the prevalence methods no need of DNA quantification. “In this method, the prevalence (presence/absence across samples) of each sequence feature in true positive samples is compared to the prevalence in negative controls to identify contaminants.

Tips: sequencing plateforms often quantify the DNA before sequencing, but do not automaticaly give the information. Just ask for it ;).

Our function integrates the basics ASV frequency (nb_reads_ASV/nb_total_reads) and prevalence (nb_sample_ASV/nb_total_sample) filtering. As in our lab we had a known recurrent contaminant we included an option to filter out ASV based on they taxa names.

data = decontam_fun(data = data, domain = "Bacteria", column = "type", ctrl_identifier = "control", spl_identifier = "sample", number = 100)

Main output: - robjects.Rdata with contaminant filtered phyloseq object named data. - Exclu_out.csv list of filtered ASVs for each filtering step. - Kronas before and after filtering. - raw_asv-table.csv & relative_asv-table.csv. - venndiag_filtering.png.

venndiag

4 Plots, diversity and statistics

!!! We are currently developping a ShinyApp to visualize your data, sub-select your samples/taxons and do all those analyses interactively !!! ExploreMetabar

4.1 Rarefaction curves

In order to observe the sampling depth of each samples we start by plotting rarefactions curves. Those plots are generated by Plotly which makes the plots interactive.

rareplot = rarefaction(data, "souche_temps", 100 )
htmltools::tagList(list(rareplot))

4.2 Composition plots

Composition plots reveals here the top 10 genus present in our samples. #TODO Ord1 option control the… Fact1 option control the…

4.2.1 Raw abundance

bars_fun(data = data, top = 10, Ord1 = "souche_temps", Fact1 = "souche_temps", rank="Genus", relative = FALSE)

4.2.2 Relative abundance

bars_fun(data = data, top = 10, Ord1 = "souche_temps", Fact1 = "souche_temps", rank="Genus", relative = TRUE)

4.3 Diversity analyses

4.3.1 Alpha diversity

This function computes various alpha diversity indexes and returns: - a boxplot comparing conditions. - a table of values - an ANOVA analysis - a wilcox result test comparing conditions and giving the significativity of the observed differences. - a mixture model if your data include repetition in sampling. All this in a single function.

alpha <- diversity_alpha_fun(data = data, output = "./plot_div_alpha/", column1 = "souche", column2 = "temps", column3 = "", supcovs = "", measures = c("Observed", "Shannon") )

4.3.2 Table of diversity index values

The table of values for each indices you choose to compute.

pander(alpha$alphatable, style='rmarkdown')
  Observed Shannon
SB1.Sauv0 41 1.477
SB10.Mut0 40 2.073
SB11.Mut0 51 2.178
SB12.Mut0 38 2.116
SB13.Sauv50 46 2.691
SB14.Sauv50 57 2.905
SB15.Sauv50 50 2.793
SB16.Sauv50 52 2.8
SB17.Sauv50 49 2.624
SB18.Sauv50 54 2.831
SB19.Mut50 66 2.638
SB2.Sauv0 26 2.099
SB20.Mut50 72 2.721
SB21.Mut50 79 3.062
SB22.Mut50 81 2.81
SB23.Mut50 84 3.175
SB24.Mut50 90 3.148
SB3.Sauv0 19 0.1962
SB4.Sauv0 41 2.52
SB5.Sauv0 46 1.923
SB6.Sauv0 46 1.067
SB7.Mut0 33 2.256
SB8.Mut0 58 2.089
SB9.Mut0 50 2.237

4.3.3 Boxplots

The boxplots of those values.

ggplotly(alpha$plot) %>%
  layout(boxmode = "group")
## Warning: 'layout' objects don't have these attributes: 'boxmode'
## Valid attributes include:
## 'font', 'title', 'uniformtext', 'autosize', 'width', 'height', 'margin', 'paper_bgcolor', 'plot_bgcolor', 'separators', 'hidesources', 'showlegend', 'colorway', 'datarevision', 'uirevision', 'editrevision', 'selectionrevision', 'template', 'modebar', 'meta', 'transition', '_deprecated', 'clickmode', 'dragmode', 'hovermode', 'hoverdistance', 'spikedistance', 'hoverlabel', 'selectdirection', 'grid', 'calendar', 'xaxis', 'yaxis', 'ternary', 'scene', 'geo', 'mapbox', 'polar', 'radialaxis', 'angularaxis', 'direction', 'orientation', 'editType', 'legend', 'annotations', 'shapes', 'images', 'updatemenus', 'sliders', 'colorscale', 'coloraxis', 'metasrc', 'barmode', 'bargap', 'mapType'

4.3.4 Tests on Observed index

4.3.4.1 ANOVA results

For each indices, you have access to the ANOVA test. Here we present the result for the “Observed” indice.

pander(alpha$Observed$anova)
Analysis of Variance Model
  Df Sum Sq Mean Sq F value Pr(>F)
Depth 1 49.36 49.36 0.5091 0.4838
souche 1 1877 1877 19.36 0.0002764
temps 1 3649 3649 37.64 5.392e-06
Residuals 20 1939 96.96 NA NA

4.3.4.2 Wilcox test

Wilcox tests are made on each factor you have entered, and the combination of the two. Here “souche” and “temps”.

4.3.4.3 Wilcox test for “souche” factor

pander(alpha$Observed$wilcox_col1)
  mutant
sauvage 0.043

4.3.4.4 Wilcox test for “temps” factor FDR corrected.

pander(alpha$Observed$wilcox_col2_fdr)
  t0
t50 0.001

4.3.4.5 Wilcox test for the collapsed factors

pander(alpha$Observed$wilcox_col2_collapsed)
  mutant_t0 mutant_t50 sauvage_t0
mutant_t50 0.002 NA NA
sauvage_t0 0.377 0.005 NA
sauvage_t50 0.336 0.002 0.008

4.4 Beta diversity

# beta <- diversity_beta_light(data = data, output = "./plot_div_beta/", glom = "ASV", column1 = "temps", column2 = "souche", covar ="")
beta = diversity_beta_light(data, col = "souche", cov="temps", dist0 = "bray", ord0 = "MDS", output="./plot_div_beta/", tests = TRUE)
beta2 = diversity_beta_light(data, col = "souche_temps", dist0 = "bray", ord0 = "MDS", output="./plot_div_beta/", tests = TRUE)

4.4.1 Ordination plot

ggplotly(beta$plot)
ggplotly(beta2$plot)

4.4.2 Statistical tests

  • permanova
beta$permanova
## 
## Call:
## adonis(formula = as.formula(paste("dist1 ~ Depth +", paste(cov1,      collapse = "+"), "+", col)), data = mdata, permutations = 1000) 
## 
## Permutation: free
## Number of permutations: 1000
## 
## Terms added sequentially (first to last)
## 
##           Df SumsOfSqs MeanSqs F.Model      R2   Pr(>F)    
## Depth      1    0.5075 0.50751  3.1218 0.06845 0.013986 *  
## temps      1    2.1846 2.18458 13.4380 0.29463 0.000999 ***
## souche     1    1.4711 1.47112  9.0493 0.19841 0.000999 ***
## Residuals 20    3.2514 0.16257         0.43851             
## Total     23    7.4146                 1.00000             
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
  • pairwise permanova on concatenated factors
pander(beta2$pairwisepermanova)
Table continues below
pairs Df SumsOfSqs F.Model R2 p.value
sauvage_t0 vs mutant_t0 1 0.9528 4.64 0.317 0.007
sauvage_t0 vs sauvage_t50 1 2.021 28.97 0.7434 0.004
sauvage_t0 vs mutant_t50 1 2.197 26 0.7223 0.003
mutant_t0 vs sauvage_t50 1 1.681 11.59 0.5369 0.007
mutant_t0 vs mutant_t50 1 1.57 9.826 0.4956 0.003
sauvage_t50 vs mutant_t50 1 1.818 75.22 0.8827 0.003
p.adjusted sig
0.007 *
0.006 *
0.006 *
0.007 *
0.006 *
0.006 *

4.5 Differential analyses

4.5.1 Metacoder

out1 = metacoder_fun(data = data, output = "./metacoder", column1 = "souche_temps", rank = "Family", signif = TRUE, plottrees = TRUE, min ="10", comp = "sauvage_t50~mutant_t50,sauvage_t0~mutant_t0")
  • Table
 DT::datatable(out1$table, filter = "top", options = list(scrollX = TRUE))
  • Comparaison 1
 out1$sauvage_t0_vs_mutant_t0$plot

  • Comparaison 2
 out1$sauvage_t50_vs_mutant_t50$plot

4.5.2 DESeq2

#fig.keep='all', fig.align='left', fig.width = 15, fig.height = 10
out2 = deseq2_fun(data = data, output = "./deseq/", column1 = "souche_temps", verbose = 1, rank = "Family", comp = "sauvage_t50~mutant_t50,sauvage_t0~mutant_t0")
ggplotly(out2$sauvage_t50_vs_mutant_t50$plot)
DT::datatable(out2$sauvage_t50_vs_mutant_t50$table, filter = "top", options = list(scrollX = TRUE))

4.5.3 MetagenomeSeq

out3 = metagenomeseq_fun(data = data, output = "./metagenomeseq/", column1 = "souche_temps", verbose = 1, rank = "Family", comp = "sauvage_t50~mutant_t50,sauvage_t0~mutant_t0")
ggplotly(out3$sauvage_t50_vs_mutant_t50$plot)
DT::datatable(out3$sauvage_t50_vs_mutant_t50$table, filter = "top", options = list(scrollX = TRUE))

4.5.4 Aggregate methods

resF = aggregate_fun(data = data, metacoder = "./metacoder/metacoder_signif_Family.csv", deseq = "./deseq/", mgseq = "./metagenomeseq/", output = "./aggregate_diff/", column1 = "souche_temps", column2 = NULL, verbose = 1, rank = "Genus", comp = "sauvage_t50~mutant_t50,sauvage_t0~mutant_t0")
ggplotly(resF$sauvage_t0_vs_mutant_t0$plot)
ggplotly(resF$sauvage_t50_vs_mutant_t50$plot)
DT::datatable(resF$table, filter = "top", options = list(scrollX = TRUE))